Entity Resolution and Tracking on Social Networks: A Dissertation Submitted to the Department of Computer Science and the Committee on Graduate Studies of Stanford University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Authors

  • Norases Vesdapunt
  • Andreas Paepcke
Abstract

In this thesis we study two aspects of the problem of Entity Resolution (ER). The goal of ER is to identify and merge records that refer to the same underlying entity. The recent rise in adoption of social networks (Facebook, Google+, Twitter, and others) introduces two new twists to the traditional ER problem: crowdsourcing and limited information.

We first study a hybrid human-machine approach to solving ER problems. Machine learning models can predict the probability that a pair of records refers to the same entity, but machines make mistakes. Humans can help verify whether a pair of records matches, and social systems like Facebook allow users to help resolve entities on their platforms. We propose hybrid human-machine strategies with theoretical guarantees that leverage transitivity relations (e.g., a = c can be inferred given a = b and b = c).

Next, we study the problem of ER with limited information. Social systems impose limits on API calls, which constrain access to their full social graphs. We focus on the resolution of a single node g from one social graph G against a second social graph T. We want to find the best match for g in T by dynamically probing T through a public API, subject to the number of API calls that these social systems allow. We propose two ER strategies that are designed for limited information and can be adapted to different API limits.

Finally, we study the problem of updating social graph snapshots with limited information. Effective social network ER requires up-to-date snapshots. Because social systems limit the number of API calls, we seek to update a snapshot efficiently: we want to avoid re-crawling all of the nodes and to minimize the number of API calls. We propose novel snapshot update strategies that are designed for limited information and can be adapted to different levels of staleness.
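The transitivity relation mentioned in the abstract can be illustrated with a small sketch. The Python below uses a union-find (disjoint-set) structure, a standard way to keep track of equalities that follow from ones already confirmed. It is not taken from the thesis; the class name `EntityClusters` and its methods are hypothetical names chosen for illustration.

```python
class EntityClusters:
    """Tracks which records are known to refer to the same entity.

    Illustrative only: a plain union-find, not the thesis's strategy.
    """

    def __init__(self):
        # Maps each record to its parent; a root is its own parent.
        self.parent = {}

    def find(self, record):
        """Return the representative record of the cluster containing `record`."""
        self.parent.setdefault(record, record)
        root = record
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every record on the path directly at the root.
        while self.parent[record] != root:
            next_record = self.parent[record]
            self.parent[record] = root
            record = next_record
        return root

    def mark_same(self, a, b):
        """Record a confirmed match a = b (for example, verified by a human)."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_a] = root_b

    def same(self, a, b):
        """True if a = b was confirmed directly or follows by transitivity."""
        return self.find(a) == self.find(b)


clusters = EntityClusters()
clusters.mark_same("a", "b")
clusters.mark_same("b", "c")
inferred = clusters.same("a", "c")   # follows from a = b and b = c
unknown = clusters.same("a", "d")    # nothing links a and d
```

In this sketch, a pair whose answer is already implied by earlier confirmations does not need to be sent to a human again, which is the kind of saving that transitivity makes possible.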


Similar resources

Gaze-enhanced User Interface Design: A Dissertation Submitted to the Department of Computer Science and the Committee on Graduate Studies of Stanford University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy


Structuring Peer Interactions for Massive Scale Learning: A Dissertation Submitted to the Department of Computer Science and the Committee on Graduate Studies of Stanford University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy


Haptics and Physical Simulation for Virtual Bone Surgery: A Dissertation Submitted to the Department of Computer Science and the Committee on Graduate Studies of Stanford University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy



Publication date: 2016